Method for processing electronic data

ABSTRACT

A method for processing electronic data includes the steps of transforming the electronic data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; and processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices.

TECHNICAL FIELD

The present invention relates to a method for processing electronic data, and particularly, although not exclusively, to a method for obtaining low-rank representation of signals or data near a low-dimensional subspace and data reconstruction.

BACKGROUND

Electronic data stored in electronic storage medium may be analysed using different processor, including digital processor, which is designed for processing digital source data in the digital domain.

For example, digital images may be processed by a digital signal processor which may efficiently analyse and process the digital image, and the processor may further process the image by further modifying the image data using certain algorithm or digital filter. Alternatively, other types of digital data, such as data related to dimensionality reduction, computer vision, machine learning, pattern classification and data analytics, may also be analysed or processed using digital methods.

SUMMARY OF THE INVENTION

In accordance with a first aspect of the present invention, there is provided a method for processing electronic data, comprising the steps of: transforming the electronic data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; and processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices.

In an embodiment of the first aspect, the approximation process includes a greedy pursuit approximation process.

In an embodiment of the first aspect, the approximation process includes a linear regression process.

In an embodiment of the first aspect, the approximation process includes determining a best rank-one approximation of a plurality of residual matrices in each of a plurality of iteration steps in the approximation process.

In an embodiment of the first aspect, the approximation process further includes subtracting a plurality of rank-one basis matrices from the plurality of residual matrices.

In an embodiment of the first aspect, the method further comprises the step of representing a solution associated with the matrix representation and the K residual matrices as (u_(i),v_(i),s^(i)) and {R_(k) ^(i)}_(k=1) ^(K) at the ith iteration step.

In an embodiment of the first aspect, in the ith iteration step, the rank-one approximation of {R_(k) ^(i−1)}_(k=1) ^(K) is determined.

In an embodiment of the first aspect, the method further comprises the step of determining the coefficients {s_(k,l)} of the matrix representation (u_(i),v_(i),s^(i)) at the ith iteration.

In an embodiment of the first aspect, the coefficients {s_(k,l)} are determined by a least squares method.

In an embodiment of the first aspect, the coefficients {s_(k,l)} are determined after obtaining {u_(l)}_(l=1) ^(i) and {v_(l)}_(l=1) ^(i) at the ith iteration.

In an embodiment of the first aspect, the approximation process includes an orthogonal greedy pursuit approximation process.

In an embodiment of the first aspect, the approximation process further includes a rank-one fitting process for obtaining the low-rank approximation of the plurality of matrices.

In an embodiment of the first aspect, the low-rank approximation of the plurality of matrices are determined to be a principal singular value of each of the plurality of matrices having a maximum spectral norm.

In an embodiment of the first aspect, the approximation process includes an economic greedy pursuit approximation process.

In an embodiment of the first aspect, the step of decomposing the matrix representation into a series of matrix approximations includes a non-orthogonal decomposition of the plurality of matrices.

In an embodiment of the first aspect, the non-orthogonal decomposition includes a joint diagonal decomposition.

In an embodiment of the first aspect, the electronic data include at least one of digital image data, textual data, audio data, and financial data.

In an embodiment of the first aspect, the electronic data are digitally compressed.

In an embodiment of the first aspect, the method further comprises the step of constructing a data representation of the electronic data being processed based on the low-rank approximation of the plurality of matrices.

In an embodiment of the first aspect, processing of electronic data includes image reconstruction of digital image with noise and/or defect.

In an embodiment of the first aspect, processing of electronic data includes a feature extraction process for classification of the electronic data.

In accordance with a second aspect of the present invention, there is provided a method for processing digital image data, comprising the steps of: transforming the digital image data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices; and constructing at least one digital image based on the low-rank approximation of the plurality of matrices.

In an embodiment of the second aspect, the digital image data represent at least one source image with noise and/or defect.

In an embodiment of the second aspect, the approximation process includes a greedy pursuit approximation process.

In an embodiment of the second aspect, the step of decomposing the matrix representation into a series of matrix approximations includes a non-orthogonal decomposition of the plurality of matrices.

In an embodiment of the second aspect, the non-orthogonal decomposition includes a joint diagonal decomposition.

In an embodiment of the second aspect, the method further comprises the step of processing the digital image data using a feature extraction process for classification of the digital image data.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings in which:

FIG. 1 is a schematic diagram of a computing server for operation as a system for processing electronic data in accordance with one embodiment of the present invention;

FIG. 2 is a plot showing normalized reconstruction error (NRE) versus iteration number using random data in noise-free case involved in the method used in accordance with one embodiment of the present invention;

FIG. 3 is a plot showing NRE versus iteration number using random data in Gaussian noise at SNRs of 10, 20, and 30 dB involved in the method used in accordance with one embodiment of the present invention;

FIG. 4 are plots showing NRE versus iteration number for ORL face, Georgia Tech face, MNIST handwritten digit, and COIL-20 databases;

FIG. 5 is a combined image showing samples of original, noisy, and reconstructed images for ORL face database;

FIG. 6 is a combined image showing samples of original, noisy, and reconstructed images on Georgia Tech face database;

FIG. 7 is a combined image showing samples of original, noisy, and reconstructed images on MNIST database of handwritten digits;

FIG. 8 is a combined image showing samples of original, noisy, and reconstructed images for COIL-20 database of object;

FIG. 9 are plots showing NREs of PCA, RPCA, 2DPCA, GLARM, EGP, GP and OGP method versus compression ratio for four example databases; and

FIG. 10 is a plot showing recognition rate versus compression ratio for ORL face database.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The inventors have, through their own research, trials and experiments, devised that, various kind of data, such as electronic data, may be approximated by matrices whose ranks are much smaller than their column and row lengths. The low-rank matrix approximation may extract the low-dimensional subspaces or principal components of the matrices constructed from these signals.

In some example embodiments, low-rank approximation may be applied in computer vision, pattern classification, machine learning, and data analytics. In these examples, signals or data such as textual, visual, audio and financial data lie near some low-dimensional subspace.

For example, principal component analysis (PCA), which may be realized by the truncated singular value decomposition (SVD), may be applied in processing the data. However, it is found that PCA and SVD may only deal with a single matrix and fail to capture the low-rank components when the observations contain outliers.

The inventors noted that to deal with multiple matrices without vectorization, a two-dimensional PCA (2DPCA) may be applied to directly transform the original matrices into ones with lower column number. However, the 2DPCA merely reduces the column size while the row size remains unchanged because a single-sided transform is adopted, implying limited compression capability.

The generalized low rank approximations of matrices (GLRAM) applies a double-sided transform to reduce both row and column sizes, which considerably improves compression capability. Under the same compression ratio, the GLRAM achieves smaller reconstruction error than the 2DPCA.

Therefore, the inventors have devised a joint low-rank approximation of multiple matrices (LRAMM). As in the 2DPCA and GLRAM, the LRAMM also does not convert matrices into vectors and thus avoids processing the matrix with much larger size. Different from the GLRAM, the LRAMM achieves a non-orthogonal but joint diagonal factorization of multiple matrices, which leads to a more compact representation and a higher compression ratio.

In one example embodiment, there is provided a greedy pursuit (GP) method for joint low-rank approximation of multiple matrices via decomposing the problem into a series of rank-one approximations. At each iteration, the best rank-one approximation of the residual matrices is determined, and then the rank-one basis matrices are subtracted from the residual. An alternating optimization method is designed for the rank-one fitting. In addition, the GP method is further modified/optimized to an economic greedy pursuit (EGP) method and an orthogonal greedy pursuit (OGP) method, which further reduces the complexity and accelerates the convergence, respectively.

With reference to FIG. 1, an embodiment of the present invention is illustrated. This embodiment is arranged to provide a system for processing electronic data, such as but not limited to digital image data, textual data, audio data, and financial data. The system may perform a method for processing electronic data, comprising the steps of: transforming the electronic data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; and processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices.

Preferably, in one example, the system may be used to process digital image data which may be mixed with noise and/or defects (such as missing data from a source image). The raw data of the digital image may be represented, transformed or transcoded as matrix or matrices representing the image data for further processing. The digital data may also be digitally compressed in some examples, and the data may be further compressed or decompressed by the system.

In this embodiment, the system for processing electronic data is implemented by or for operation on a computer having an appropriate user interface. The computer may be implemented by any computing architecture, including stand-alone PC, client/server architecture, “dumb” terminal/mainframe architecture, or any other appropriate architecture. The computing device is appropriately programmed to implement the invention.

Referring to FIG. 1, there is shown a schematic diagram of a computer or a computing server 100 which in this embodiment comprises a server 100 arranged to operate, at least in part if not entirely, the system for processing electronic data with the method in accordance with one embodiment of the invention. The server 100 comprises suitable components necessary to receive, store and execute appropriate computer instructions. The components may include a processing unit 102, read-only memory (ROM) 104, random access memory (RAM) 106, and input/output devices such as disk drives 108, input devices 110 such as an Ethernet port, a USB port, etc. Display 112 such as a liquid crystal display, a light emitting display or any other suitable display and communications links 114. The server 100 includes instructions that may be included in ROM 104, RAM 106 or disk drives 108 and may be executed by the processing unit 102. There may be provided a plurality of communication links 114 which may variously connect to one or more computing devices such as a server, personal computers, terminals, wireless or handheld computing devices. At least one of a plurality of communications link may be connected to an external computing network through a telephone line or other type of communications link.

The server may include storage devices such as a disk drive 108 which may encompass solid state drives, hard disk drives, optical drives or magnetic tape drives. The server 100 may use a single disk drive or multiple disk drives. The server 100 may also have a suitable operating system 116 which resides on the disk drive or in the ROM of the server 100.

The system has a database 120 residing on a disk or other storage device which is arranged to store at least one record. The database 120 is in communication with the server 100 with an interface, which is implemented by computer software residing on the server 100. Alternatively, the database 120 may also be implemented as a stand-alone database system in communication with the server 100 via an external computing network, or other types of communication links.

In this embodiment, the system for processing electronic data may perform an approximation process including a greedy pursuit (GP) approximation process, an economic greedy pursuit (EGP) approximation process and/or an orthogonal greedy pursuit (OGP) approximation process.

Preferably, the entire process may start with formulating an initial problem according to the following steps. Given K matrices {A₁, . . . , A_(K)}∈

^(m×n), the task is to find their low-rank approximation. In one example, the following low-rank approximation of A_(k) may be used: A _(k) ≈US _(k) V ^(T) , k=1, . . . ,K  (1) where U=┌u₁, . . . , u_(m)┐∈

^(m×r), V=┌v₁, . . . , u_(n)┐∈

^(n×r), and S_(k)=diag{s_(k,l), . . . , s_(k,r)}∈

^(r×r) is a diagonal matrix with r being the target rank. Note that r>min (m,n) is allowed but generally r≤mn is required. Here, U and V are the same for all k but S_(k) can be different with each other. The columns of U and V span the r-dimensional subspaces of the column and row spaces of {A_(k)}_(k=1) ^(K).

Therefore, the low-rank approximation also achieves the task of subspace learning. According to the least squares criterion, the LRAMM method aims to solve the following minimization problem:

$\begin{matrix} {\min\limits_{U,V,{\{ S_{k}\}}_{k = 1}^{K}}{\sum\limits_{k = 1}^{K}\;{{{{{US}_{k}V^{T}} - A_{k}}}_{F}^{2}.}}} & (2) \end{matrix}$

If (U,V,S_(k)) is an optimal solution of (2), then (α₁U,α₂V,α₃S_(k)) with {α₁,α₂,α₃}∈

satisfying α₁α₂α₃=1 is also a solution. To avoid this scaling indeterminacy, the norms of {u_(i), v_(i)}_(i=1) ^(r) may be constrained to be unit, and (2) may be rewritten as

$\begin{matrix} {{\min\mspace{14mu} f\mspace{14mu}\left( {u_{i},v_{i},\left\{ s^{i} \right\}_{i = 1}^{r}} \right)\mspace{14mu}\text{:=}\mspace{14mu}{\sum\limits_{k = 1}^{K}\;{{{\sum\limits_{i = 1}^{r}\;{s_{k,i}u_{i}v_{i}^{T}}} - A_{k}}}_{F}^{2}}}{{{s.t.\mspace{14mu}{u_{i}}} - {v_{i}}} = 1},{i = 1},\ldots\;,r} & (3) \end{matrix}$ where s ^(i)=[s _(1,i) , . . . ,s _(K,i)]^(T)∈

^(K).  (4) That is, the LRAMM problem is to find U, V and {S_(k)}_(k=1) ^(K) given {A_(k)}_(k=1) ^(K) according to (3).

Preferably, data compression may be one example application of the LRAMM method: mnK real numbers are required to store the K original matrices but the memory complexity is reduced to (m+n+K)r via low-rank approximation which is only characterized by U, V and the diagonal elements of {S_(k)}_(k=1) ^(K), yielding a compression ratio of mnK/((m+n+K)r). This implies that (1) can still achieve data compression when r>min(m,n) if r<<mn.

Indeed, the compression ratio and reconstruction error decrease as r increases. Hence r>min(m,n) may be employed to attain a satisfying reconstruction error. It is worth pointing out that if the number of matrices is K=1, by the Eckart-Young theorem, the solution of (3) is the truncated SVD of A₁. That is, {s_(1,i)}_(i=1) ^(r) are the r largest singular values of A₁, and {u_(i)}_(i=1) ^(r) and {v_(i)}_(i=1) ^(r) are the corresponding left and right singular vectors, respectively. However, when the number of matrices is K>1, the truncated SVD cannot be applied to solving (3).

In some examples, PCA may be applied for low-rank representation. However, the PCA cannot directly handle multiple matrices whereas each matrix needs to be converted into a vector a_(k)=vec(A_(k))∈

^(mn). Then, the K vectors {a_(k)}_(k=1) ^(K) form the columns of the following data matrix: X=[a ₁ , . . . , a _(K)]∈

^(mn×K)  (5) whose rank satisfies rank(X)≤min(mn,K). The PCA aims to find a low-rank matrix {circumflex over (X)} to best approximate the original data matrix X:

$\begin{matrix} {\min\limits_{{{rank}{\{\hat{X})}} = r}\mspace{14mu}{{\hat{X} - X}}_{F}^{2}} & (6) \end{matrix}$ where the target rank r is usually taken as r<<min(mn,K) to achieve data compression or dimensionality reduction. Again, by the Eckart-Young theorem, the solution of (6) is given by the truncated SVD of X, which is expressed as:

$\begin{matrix} {\hat{X} = {\sum\limits_{l = 1}^{r}\;{{\sigma_{l}(X)}t_{l}y_{l}^{T}}}} & (7) \end{matrix}$ where σ_(l)(X) is the lth singular value of X while t_(l) ∈

^(mn) and y_(l) ∈

^(K) are the corresponding left and right singular vectors.

In data compression, it may be required to store the so-called principal components, that is, the largest r singular values and the corresponding singular vectors {σ_(l)(X),t_(l),y_(l)}_(l=1) ^(r), resulting in the PCA compression ratio of mnK/((mn+K+1)r). Nevertheless, there are two drawbacks of the PCA method applied to processing multiple matrices.

First, it needs to handle a matrix of a much larger size due to transforming the original matrix into a long vector. Indeed, the truncated SVD of X has a high complexity of

(max(m²n², K²)r). For high-dimensional data with mn>K, this complexity becomes

(m²n²r), which quadratically increases with mn. Second, the vectorization breaks the 2D structure and the inherent relation between rows and columns.

The 2DPCA computes a linear transformation W∈

^(n×r) with r<n, such that each matrix is projected to C_(k)=A_(k)W∈

^(m×r). The W maximizes the variance in the transformed space

$\begin{matrix} {\max\limits_{{W^{T}W} = I_{r}}\mspace{14mu}{{W^{T}\left( {\sum\limits_{k = 1}^{K}\;{\left( {A_{k} - \overset{\_}{A}} \right)^{T}\left( {A_{k} - \overset{\_}{A}} \right)}} \right)}\mspace{14mu} W}} & (8) \end{matrix}$ where Ā=(Σ_(k=1) ^(K)A_(k))/K is the mean of the matrices. The columns of the optimal W are the r eigenvectors of Σ_(k=1) ^(K) (A_(k)−AĀ)^(T) (A_(k)−Ā) corresponding to the largest r eigenvalues. The matrices can be reconstructed by Â_(k)=C_(k)W^(T). The 2DPCA needs to store {C_(k)}_(k=1) ^(K) and W, implying a compression ratio of (mnK)/((mK+n)r). As the 2DPCA only applies a single-sided transform to the matrices, its compression capability is thus limited. The computational complexity of the 2DPCA is

(mn²K).

The GLRAM method may solve the following constrained problem:

$\begin{matrix} {\min\limits_{{{U^{T}U} = {{V^{T}V} = I_{r}}},{\{ S_{k}\}}}{\sum\limits_{k = 1}^{K}\;{{{{US}_{k}V^{T}} - A_{k}}}_{F}^{2}}} & (9) \end{matrix}$ where the orthonormal constraints make it different from the LRAMM formulation (3) where U and V are not necessarily orthogonal. In addition, the matrices {S_(k)}_(k=1) ^(K) given by the GLRAM are not diagonal whereas those of the LRAM are diagonal. Hence, the compression ratio of the GLRAM is mnK/((m+n+Kr)r), which is lower for the same r. Exploiting the orthogonality of U and V, the GLRAM is reduced to

$\begin{matrix} {\max\limits_{{U^{T}U} = {{V^{T}V} = I_{r}}}{\sum\limits_{k = 1}^{K}\;{{U^{T}A_{k}V}}_{F}^{2}}} & (10) \end{matrix}$ where a two-sided transform is performed.

Preferably, the processing method in accordance with an embodiment of the present invention involves processing, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices with an approximation process, and the approximation process may include a greedy pursuit approximation process as discussed earlier.

The analysis and processing of the matrices may begin after transformation of these source electronic data is completed, the matrix representation of the electronic data may then be processed with the approximation process after a decomposition process. Preferably, the step of decomposing the matrix representation into a series of matrix approximations includes a non-orthogonal decomposition of the plurality of matrices, wherein the non-orthogonal decomposition includes a joint diagonal decomposition.

In this example, the Greedy Pursuit (GP) approximation process includes decomposing the r-term approximation into a series of rank-one approximations. At each iteration, the greedy algorithms perform rank-one approximation of the residual matrices obtained from the previous iteration. Then, the rank-one matrices are subtracted from the residuals and never revisited.

The approximation process may also include a linear regression process, which may include the step of determining a best rank-one approximation of a plurality of residual matrices in each of a plurality of iteration steps in the approximation process.

The GP approximation process for LRAMM may be performed as described in Algorithm 1. It works in an iterative fashion. Preferably, the method comprises a step of representing a solution associated with the matrix representation and the K residual matrices as (u_(i),v_(i),s^(i)) and {R_(k) ^(i)}_(k=1) ^(K) at the ith iteration step. In the ith iteration step, the GP finds the rank-one approximation of {R_(k) ^(i−1)}_(k=1) ^(K), which may be expressed as

$\begin{matrix} {{\min\mspace{14mu}{f\left( {u,v,s} \right)}\mspace{14mu}\text{:=}\mspace{14mu}{\sum\limits_{k = 1}^{K}\;{{{s_{k}{uv}^{T}} - R_{k}^{i - 1}}}_{F}^{2}}}{{s.t.\mspace{14mu}{u}} = {{v} = 1}}} & (11) \end{matrix}$ where s=[s₁, . . . , s_(K)]^(T) collects the K variables {s_(k)}_(k=1) ^(K) to be optimized. The updating equations are:

$\begin{matrix} {{\left( {u_{i},v_{i},s^{i}} \right) = {\underset{{{u} = {{v} = 1}},s}{\arg\mspace{14mu}\min}\mspace{14mu}{\sum\limits_{k = 1}^{K}\;{{{s_{k}{uv}^{T}} - R_{k}^{i - 1}}}_{F}^{2}}}}{and}} & (12) \\ {{R_{k}^{i} = {R_{k}^{i - 1} - {s_{k,i}u_{i}v_{i}^{T}}}},{k = 1},\ldots\;,{K.}} & (13) \end{matrix}$

When the target rank r is given or can be estimated, the process terminates when i>r. If the rank information is unavailable, the normalized reconstruction error (NRE), which is defined as the energy of the K residual matrices, is employed as the stopping criterion:

$\begin{matrix} {{NRE} = {\frac{\sum\limits_{k = 1}^{K}\;{R_{k}^{i}}_{F}^{2}}{\sum\limits_{k = 1}^{K}\;{R_{k}^{0}}_{F}^{2}} \leq \delta}} & (14) \end{matrix}$ where δ>0 is the tolerance. The remaining problem is to efficiently solve the rank-one approximation of multiple matrices, which is presented as follows.

Algorithm 1 GP for LRAMM Input: Matrices {A_(k)}_(k=1) ^(K) and target rank r.  Initialization: Set residual R_(k) ⁰ = A_(k) for k = 1, . . . , K.  for i = 1, 2, . . . , r do   Solve rank-one approximation     $\left( {u_{i},v_{i},s^{i}} \right) = {\underset{{{u} = {{v} = 1}},s}{\arg\;\min}{\sum\limits_{k = 1}^{K}\;{{{{s_{k}{uv}^{T}} - R_{k}^{i - 1}}}_{F}^{2}.}}}$   Update residual     R_(k) ^(i) = R_(k) ^(i−1) − s_(k,i)u_(i)v_(i) ^(T), k = 1, . . . , K.  end for Output: U = [u₁, . . . , u_(r)], V = [v₁, . . . , v_(r)], and {s^(i)}_(i=1) ^(r).

For the purpose of simplicity, the superscript (·)^(i−1) of R_(k) ^(i−1) may be omitted and (11) may be rewritten as

$\begin{matrix} {\min\limits_{{{u} = {{v} = 1}},s}{\mspace{14mu}{\sum\limits_{k = 1}^{K}\;{{{{s_{k}{uv}^{T}} - R_{k}}}_{F}^{2}.}}}} & (15) \end{matrix}$ The rank-one approximation of (15) is not easy to solve since the unit-norm constraints are nonconvex and the product term of the objective function s_(k)uv^(T) is also nonconvex. Therefore s may be emanated to obtain an optimization problem only with u and v. The kth term of (15) is computed as

$\begin{matrix} \begin{matrix} {{f_{k}\left( {u,v,s_{k}} \right)} = {{{s_{k}{uv}^{T}} - R_{k}}}_{F}^{2}} \\ {= {{s_{k}^{2}{{uv}^{T}}_{F}^{2}} - {2s_{k}{{tr}\left( {{vu}^{T}R_{k}} \right)}} + {R_{k}}_{F}^{2}}} \\ {= {s_{k}^{2} - {2\left( {u^{T}R_{k}v} \right)s_{k}} + {R_{k}}_{F}^{2}}} \end{matrix} & (16) \end{matrix}$ where ∥A∥_(F) ²=tr(A^(T)A) and tr(AB)=tr(BA) as well as the fact ∥uv^(T)∥_(F) ²=tr (vu^(T)uv^(T))=∥u∥²∥v∥²=1 may be exploited. Apparently, {s_(k)}_(k=1) ^(K) are decoupled with each other and can be solved independently. For fixed u and v, the optimal s_(k) minimizing f_(k), denoted by s_(k)*, is given by

$\begin{matrix} {\frac{\partial f_{k}}{\partial s_{k}} = {{2\left( {s_{k} - {u^{T}R_{k}v}} \right)} = {\left. 0\Rightarrow s_{k}^{*} \right. = {u^{T}R_{k}{v.}}}}} & (17) \end{matrix}$

Plugging (17) back into (16) yields the minimum value of the objective function

$\begin{matrix} {\left\{ {\min\limits_{s_{k}}\mspace{14mu}{f_{k}\left( {u,v,s_{k}} \right)}} \right\} = {{R_{k}}_{F}^{2} - {\left( {u^{T}R_{k}v} \right)^{2}.}}} & (18) \end{matrix}$ then, the following problem with s being eliminated may be obtained

$\begin{matrix} {\min\limits_{{u} = {{v} = 1}}\left\{ {{\sum\limits_{k = 1}^{K}\;{R_{k}}_{F}^{2}} - {\sum\limits_{k = 1}^{K}\;\left( {u^{T}R_{k}v} \right)^{2}}} \right\}} & (19) \end{matrix}$ where the reconstruction error Σ_(k=1) ^(K)∥R_(k)∥_(F) ² is a constant at the current iteration. It is clear that (19) amounts to

$\begin{matrix} {\max\limits_{{u} = {{v} = 1}}{\sum\limits_{k = 1}^{K}{\left( {u^{T}R_{k}v} \right)^{2}.}}} & (20) \end{matrix}$

For K=1, the optimal estimates of (20), u* and v*, are the left and right singular vectors associated with the largest singular value σ_(max)(R₁), respectively, which can be efficiently computed by the power method with a low complexity of

(mn). The corresponding maximum of the objective function is σ_(max) ²(R₁). Note that the largest singular value of a matrix is its spectral norm, i.e., σ_(max)(R₁)=∥R₁∥₂.

For K>1, an alternating maximization method may be used to solve (20), where the objective function is maximized over one vector while the other vector is fixed. To be more specific, at the (j+1)th (j=0, 1, . . . ) iteration, u and v are alternately maximized:

$\begin{matrix} {{u^{j + 1} = {\arg\;{\max\limits_{{u} = 1}{\sum\limits_{k = 1}^{K}\left( {u^{T}R_{k}v^{j}} \right)^{2}}}}}{and}} & (21) \\ {v^{j + 1} = {\arg\;{\max\limits_{{v} = 1}{\sum\limits_{k = 1}^{K}{\left( {\left( u^{j + 1} \right)^{T}R_{k}v} \right)^{2}.}}}}} & (22) \end{matrix}$ then (21) may be rewritten as

$\begin{matrix} {\max\limits_{{u} = 1}{{u^{T}\left( {\sum\limits_{k = 1}^{K}{R_{k}{v^{j}\left( {R_{k}v^{j}} \right)}^{T}}} \right)}u}} & (23) \end{matrix}$ in which an optimal solution may be the unit-norm eigenvector corresponding to the maximum eigenvalue of the positive definite matrix Σ_(k=1) ^(K)R_(k)v^(j) (R_(k)v^(j))^(T).

Similarly, the solution of (22) is the unit-norm eigenvector associated with the maximum eigenvalue of Σ_(k=1) ^(K)R_(k) ^(T)u^(j+1) (R_(k) ^(T)u^(j+1))^(T). Since (20) is nonconvex, the final convergence result relies on the initial values u⁰ and v⁰. Appropriate selection of the initial values is thus important. Suppose that the k_(m)th (k_(m) ∈{1, . . . , K}) matrix R_(k) _(m) has the maximum spectral norm. That is,

$\begin{matrix} {{R_{k_{m}}}_{2} = {\max\limits_{1 < k < K}{{R_{k}}_{2}.}}} & (24) \end{matrix}$

By using the principal singular vectors of R_(k) _(m) as the initial values: u ⁰=LSV_(max)(R _(k) _(m) ), v ⁰=RSV_(max)(R _(k) _(m) )  (25) where LSV_(max)(R_(k) _(m) ) and RSV_(max)(R_(k) _(m) ) stand for the left and right singular unit-norm vectors associated with the maximum singular value of R_(k) _(m) , respectively. Their updating equations are:

$\begin{matrix} {{u^{j + 1} = {{EV}_{{ma}\; x}\left( {\sum\limits_{k = 1}^{K}{R_{k}{v^{j}\left( {R_{k}v^{j}} \right)}^{T}}} \right)}}{and}} & (26) \\ {v^{j + 1} = {{EV}_{{ma}\; x}\left( {\sum\limits_{k = 1}^{K}{R_{k}^{T}{u^{j + 1}\left( {R_{k}^{T}v^{j + 1}} \right)}^{T}}} \right)}} & (27) \end{matrix}$ where EV_(max)(·) represents the unit-norm eigenvector associated with the maximum eigenvalue. The procedure for solving the rank-one approximation of multiple matrices of (15) is summarized in Algorithm 2.

Algorithm 2 Rank-One Approximation of Multiple Matrices Input: Matrices {R_(k)}_(k=1) ^(K).  Initialization: Determine k_(m) by       $k_{m} = {\arg\;{\max\limits_{1 \leq k \leq K}{{R_{k}}_{2}.}}}$  Set u⁰ = LSV_(max)(R_(k) _(m) ) and v⁰ = RSV_(max)(R_(k) _(m) ).  for j = 0, 1, 2, . . . do   Update u and v:      $u^{j + 1} = {{EV}_{\max}\left( {\sum\limits_{k = 1}^{K}\;{R_{k}{v^{j}\left( {R_{k}v^{j}} \right)}^{T}}} \right)}$     $v^{j + 1} = {{EV}_{\max}\left( {\sum\limits_{k = 1}^{K}\;{R_{k}^{T}{u^{j + 1}\left( {R_{k}^{T}u^{j + 1}} \right)}^{T}}} \right)}$   Stop until convergence satisfies.  end for  s_(k)* = (u^(j+1))^(T)R_(k)v^(j+1), k = 1, . . . , K. Output: u* = u^(j+1), v* = v^(j+1), and k* = [s₁ ^(k), . . . , s_(K)*]^(T).

Regarding the complexity of the GP approximation process, the initialization needs to compute the largest singular values of K matrices and the principal singular vectors of one matrix, whose complexity is

(mnK) if the power method is employed. The matrix-vector products R_(k)v^(j) and R_(k) ^(T)u^(j+1) require a cost of

(mn). The computational loads for the outer products R_(k)v^(j) (R_(k)v^(j))^(T) and R_(k) ^(T)u^(j+1) (R_(k) ^(T)u^(j+1)))^(T) are

(m²) and

(n²), respectively. Thus, forming the matrices in (26) and (27) requires

((mn+m²)K) and

((mn+n²)K) operations, respectively. Calculation of principal eigenvectors of (26) and (27) needs

(m²) and

(n²). In summary, the per-iteration complexity of rank-one approximation of Algorithm 2 is

((max(m,n))²K) and the total complexity is

((max(m,n))²K N_(iter)), where N_(iter) is the number of iterations required for convergence. Typically, several tens of iterations are enough to converge with high accuracy. Also, N_(iter) can be viewed as a constant independent of the dimension. Then, it is clear that the complexity of the GP for LRAMM of Algorithm 1 is

((max(m,n))²rK N_(iter)). When m and n have the same order-of-magnitude, say,

(m)˜

(n), it follows

((max(m,n))²)˜

(mn), resulting in a total complexity of approximately

(mnrK N_(iter)). This implies that the complexity of the GP algorithm is linear with respect to the number of elements in a matrix, mn, and the number of matrices, K. Hence, the GP algorithm is scalable to problem size.

In an alternative embodiment, the approximation process includes an economic greedy pursuit approximation process. In this embodiment, the approximation process further includes a rank-one fitting process for obtaining the low-rank approximation of the plurality of matrices, in which the low-rank approximation of the plurality of matrices are determined to be a principal singular value of each of the plurality of matrices having a maximum spectral norm.

The dominant cost of the GP method is the iterative procedure in Algorithm 2 for solving the rank-one fitting problem. To reduce the complexity, an economic version of the GP, namely, the EGP, which is listed in Algorithm 3 as shown below.

Algorithm 3 EGP for LRAMM Input: Matrices {A_(k)}_(k=1) ^(K) and target rank r.  Initialization: Set residual R_(k) ⁰ = A_(k) for k = 1, . . . , K.  for i = 1, 2, . . . , r do    ${{Determine}\mspace{14mu} k_{m}} = {\arg\;{\max\limits_{1 \leq k \leq K}{{R_{k}^{i - 1}}_{2}\mspace{14mu}{and}\mspace{14mu}{set}}}}$    u_(i) = LSV_(max)(R_(k) _(m) ^(i−1)), v_(i) = RSV_(max)(R_(k) _(m) ^(i−1)),   and s_(k,i) = u_(i) ^(T)R_(k) ^(i−1)v_(i) for k = 1, . . . , K.   Update residual     R_(k) ^(i) = R_(k) ^(i−1) − s_(k,i)u_(i)v_(i) ^(T), k = 1, . . . , K.  end for Output: U = [u₁, . . . , u_(r)], V = [v₁, . . . , v_(r)], and {s^(i)}_(i=1) ^(r).

Preferably, it only takes the initial values of (25), i.e., the principal singular value/vectors of the matrix having the maximum spectral norm, as an approximate solution to the rank-one fitting. In doing so, the iterative procedure is not involved and the algorithm complexity is reduced to

(mnr K). Note that employing the inexact solution in (25) also makes the EGP converge but its convergence rate is generally slower than that of the GP method.

In another alternative embodiment, the approximation process includes an orthogonal greedy pursuit approximation process. In this embodiment, the coefficients {s_(k,l)} of the matrix representation (u_(i),v_(i),s^(i)) is determined using a least squares method at the ith iteration step.

From Algorithm 1, it may be observed that once a rank-one solution (u_(i),v_(i),s^(i)) is obtained, it is never revisited and hence remains unchanged all the time. The OGP method may be considered as a modification to the GP method. Analogous to the orthogonal matching pursuit for sparse recovery, the OGP still keeps (u_(i),v_(i)) unchanged, however the OGP process re-computes all coefficients {s_(k,l)} by least squares, which realizes the so-called “orthogonalization” among the coefficients. To be more specific, after obtaining {u_(l)}_(l=1) ^(i) and {v_(l)}_(l=1) ^(i) at the ith iteration, {s_(k,l)} are re-computed by

$\begin{matrix} {\min\limits_{\{ s_{k,l}\}}{\sum\limits_{k = 1}^{K}{{{\sum\limits_{l = 1}^{i}{s_{k,l}u_{l}v_{l}^{T}}} - A_{k}}}_{F}^{2}}} & (28) \end{matrix}$ which can be decomposed into the following K independent minimization problems:

$\begin{matrix} {\min\limits_{s_{k,1},\ldots,s_{k,i}}{{{\sum\limits_{l = 1}^{i}{s_{k,l}u_{l}v_{l}^{T}}} - A_{k}}}_{F}^{2}} & (29) \end{matrix}$ for k=1, . . . , K, because {s_(k,l)} can be separately solved with respect to k. the vector is further defined as s _(k)=[s _(k,l) , . . . ,s _(k,i)]^(T)∈

^(i)  (30) which should be distinguished from s^(i) ∈

^(K) in (4). Obviously, if r=1, the OGP is the same as the GP because both of them are just a rank-one approximation problem. Therefore, only case of r>2 is considered for the OGP.

To derive the solution of (28). Recalling a_(k)=vec(A_(k))∈

^(mn) and defining b_(l)=vec(u_(l)v_(l))∈

^(mn), (29) amounts to

$\begin{matrix} {{\min\limits_{s_{k}}\left\{ {{{{\sum\limits_{l = 1}^{i}{s_{k,l}b_{l}}} - a_{k}}}^{2} = {{{B_{i}s_{k}} - a_{k}}}^{2}} \right\}}{where}} & (31) \\ {B_{i} = {\left\lbrack {b_{1},\ldots\mspace{14mu},b_{i}} \right\rbrack \in {{\mathbb{R}}^{{mn} \times i}.}}} & (32) \end{matrix}$

It is worth pointing out that there is no need to construct the matrices in practice and it is only required to use the vectors for derivation. Note that the column number of B_(i) is the iteration number i and varies during the iterative process. The solution of (31) is s _(k)=(B _(i) ^(T) B _(i))⁻¹ B _(i) ^(T) a _(k).  (33)

However, s_(k) is not processed by the direct use of (33) since it involves matrix multiplication and inversion, which is computationally expensive. Instead, a recursive calculation is exploited to reduce the complexity. It is clear that B_(i) ^(T)a_(k) can be recursively calculated as

$\begin{matrix} {c_{k}^{i}\overset{\Delta}{=}{{B_{i}^{T}a_{k}} = {\begin{bmatrix} {B_{i - 1}^{T}a_{k}} \\ {b_{i}^{T}a_{k}} \end{bmatrix} = \begin{bmatrix} c_{k}^{i - 1} \\ {u_{i}^{T}A_{k}v_{i}} \end{bmatrix}}}} & (34) \end{matrix}$ where c_(k) ^(i−1)=B_(i−1) ^(T)a_(k) ∈

⁻¹ is the result of the (i−1) th iteration, and is employed in the current iteration. At the beginning, c_(k) ⁰=∅ is set. Note that b_(i) ^(T)a_(k) can be computed as

$\begin{matrix} \begin{matrix} {{b_{i}^{T}a_{k}} = {\left\langle {b_{i},a_{k}} \right\rangle = \left\langle {{u_{i}v_{i}^{T}},A_{k}} \right\rangle}} \\ {= {{{tr}\left( {v_{i}u_{i}^{T}A_{k}} \right)} = {u_{i}^{T}A_{k}v_{i}}}} \end{matrix} & (35) \end{matrix}$ where

vec(A),vec(B)

=

A, B

satisfies for two arbitrary matrices A and B. When c_(k) ^(i−1) is available, the cost to obtain c_(k) ^(i) is computing b_(i) ^(T)a_(k), which requires

(mn) operations.

Next, the recursive computation of (B_(i) ^(T)B_(i))⁻¹ is considered. Obviously, B_(i) ^(T)B_(i) is the Gram matrix of the vectors {b₁, . . . , b_(i)} and it may be denoted as G_(i)=B_(i) ^(T)B_(i)∈

^(i×i). The G_(i) and G_(i−1) are related via

$\begin{matrix} {\begin{matrix} {G_{i} = {{B_{i}^{T}B_{i}} = {\begin{bmatrix} B_{i - 1}^{T} \\ b_{i}^{T} \end{bmatrix}\left\lbrack {B_{i - 1},b_{i}} \right\rbrack}}} \\ {= \begin{bmatrix} {B_{i - 1}^{T}B_{i - 1}} & {B_{i - 1}^{T}b_{i}} \\ {b_{i}^{T}B_{i - 1}} & {b_{i}^{T}b_{i}} \end{bmatrix}} \\ {= \begin{bmatrix} G_{i - 1} & g_{i - 1} \\ g_{i - 1}^{T} & 1 \end{bmatrix}} \end{matrix}{where}} & (36) \\ {{{g_{i - 1} - {B_{i - 1}^{T}b_{i}}} \Subset {\mathbb{R}}^{i - 1}}{and}} & (37) \\ {{b_{i}^{T}b_{i}} = {\left\langle {b_{i},b_{i}} \right\rangle = {\left\langle {{u_{i}v_{i}^{T}},{u_{i}v_{i}^{T}}} \right\rangle = {{{{tr}\left( {v_{i}u_{i}^{T}u_{i}v_{i}^{T}} \right)} - {{u_{i}}^{2}{v_{i}}^{2}}} = 1}}}} & (38) \end{matrix}$ are applied. Let

$\begin{matrix} {{h_{i - 1} = {G_{i - 1}^{- 1}g_{i - 1}}},{\beta = \frac{1}{1 - {g_{i - 1}^{T}h_{i - 1}}}}} & (39) \end{matrix}$ the inverse of G_(i) may be obtained as

$\begin{matrix} {G_{i}^{- 1} = \begin{bmatrix} {G_{i - 1}^{- 1} + {\beta\; h_{i - 1}h_{i - 1}^{T}}} & {{- \beta}\; h_{i - 1}} \\ {{- \beta}\; h_{i - 1}^{T}} & \beta \end{bmatrix}} & (40) \end{matrix}$ with the use of the block matrix inversion formula. Again, the result of the (i−1) th iteration G_(i−1) ⁻¹ is already available for the current iteration. The initial value is set as G₀ ⁻¹=0 while G₁ ⁻¹=1 at the first iteration. For i≥2, G_(i) ⁻¹ is recursively calculated by (40). It is only necessary to compute whose g_(i−1), whose pth (p=1, . . . , i−1) entry is

$\begin{matrix} \begin{matrix} {\left\lbrack g_{i - 1} \right\rbrack_{p} = {{b_{p}^{T}b_{i}} = {\left\langle {b_{p},b_{i}} \right\rangle = \left\langle {{u_{p}v_{p}^{T}},{u_{i}v_{i}^{T}}} \right\rangle}}} \\ {= {{{tr}\left( {v_{p}u_{p}^{T}u_{i}v_{i}^{T}} \right)} = {\left( {u_{p}^{T}u_{i}} \right)\left( {v_{p}^{T}v_{i}} \right)}}} \end{matrix} & (41) \end{matrix}$ which only requires a complexity of

(m+n) rather than

(mn). Then, g _(i−1)=[(u ₁ ^(T) u _(i))(v ₁ ^(T) v _(i)), . . . ,(u _(i−1) ^(T) u _(i))(v _(i−1) ^(T) v _(i))]^(T).  (42)

Computing g_(i−1) costs

(i(m+n)), It is either lower than or similar to the computational load of b_(i) ^(T)a_(k) in (35), namely,

(mn) due to i<r and r˜max(m,n) in general. Similar to (13), the residual is updated as

$\begin{matrix} {{R_{k}^{i} = {A_{k} - {\sum\limits_{l = 1}^{i}{s_{k,l}u_{l}v_{l}^{T}}}}},{k = 1},\ldots\mspace{14mu},{K.}} & (43) \end{matrix}$

In summary, the dominant computational load for re-computation of the coefficients in the OGP is calculating {b_(i) ^(T)a_(k)}_(k=1) ^(K), which is

(mn K). The OGP for LRAMM in summarized in Algorithm 4 as follows.

Algorithm 4 OGP for LRAMM Input: Matrices {A_(k)}_(k=1) ^(K) and target rank r.  Initialization: Set residual R_(k) ⁰ = A_(k) for k =1, . . . , K.  For i = 1, use Algorithm 1 to obtain (u₁, v₁) and {R_(k) ¹}_(k=1) ^(K).  Set G₁ ⁻¹ = 1 and c_(k) ¹ = u_(i) ^(T) A_(k)v_(i) for k = 1, . . . , K.  for i = 2, 3, . . . , r do   Solve rank-one approximation     $\left( {u_{i},v_{i}} \right) = {\underset{{{u} = {{v} = 1}},s}{\arg\;\min}{\sum\limits_{k = 1}^{K}\;{{{{s_{k}{uv}^{T}} - R_{k}^{i - 1}}}_{F}^{2}.}}}$   Compute    g_(i−1) = [(u₁ ^(T)u_(i)) (v₁ ^(T)v_(i)), . . . , (u_(i−1) ^(T)u_(i)) (v_(i−1) ^(T)v_(i))]^(T)    h_(i−1) = G_(i−1) ⁻¹g_(i−1)    β = 1/(1 − g_(i−1) ^(T)h_(i−1))     $G_{i}^{- 1} = \begin{bmatrix} {G_{i - 1}^{- 1} + {\beta\; h_{i - 1}h_{i - 1}^{T}}} & {{- \beta}\; h_{i - 1}} \\ {{- \beta}\; h_{i - 1}^{T}} & \beta \end{bmatrix}$     ${c_{k}^{i} = \begin{bmatrix} c_{k}^{i - 1} \\ {u_{i}^{T}A_{k}v_{i}} \end{bmatrix}},{k = 1},\ldots\mspace{14mu},{K.}$    [s_(k,1), . . . , s_(k,i)]^(T) = G_(i) ⁻¹c_(k) ^(i), k = 1, . . . , K.   Calculate residual      ${R_{k}^{i} = {A_{k} - {\sum\limits_{l = 1}^{i}\;{s_{k,l}u_{l}v_{l}^{T}}}}},{k = 1},\ldots\mspace{14mu},{K.}$  end for Output: U = [u₁, . . . , u_(r)], V = [v₁, . . . , v_(r)], and {s_(k,l)} with 1 ≤ k ≤ K, 1 ≤ l ≤ r.

The inventors has compared the performances including compression ratios and computational complexities of the PCA, 2DPCA, GLRAM, GP/OGP and EGP in Table 1.

TABLE I Compression ratio and computational complexity Method Compression ratio Complexity PCA $\frac{mnK}{\left( {{mn} + K + 1} \right)r}$

(max(m²n², K²)r) 2DPCA $\frac{mnK}{\left( {{mK} + n} \right)r}$

(mn²K) GLRAM $\frac{mnK}{\left( {m + n + {Kr}} \right)r}$

((m + n)²rKN_(iter)) GP/OGP $\frac{mnK}{\left( {m + n + K} \right)r}$

((max(m, n))²rKN_(iter)) EGP $\frac{mnK}{\left( {m + n + K} \right)r}$

(mnrK)

Optionally or additionally, the method for processing of electronic data, such as digital image data, may further include a feature extraction process for classification of the electronic data.

In addition to the direct application to data compression, the GP algorithms can also be applied to feature extraction for classification. Suppose a set of basis matrices {u_(i)v_(i) ^(T)}_(i=1) ^(r) has been from the training data {A_(k)}_(k=1) ^(K) using the GP, EGP or OGP. In the training stage, the class label information of the training data is not used. Thus, the GP belongs to unsupervised learning.

The tested matrix Z is deemed to have the similar low-rank structure as the training matrices {A_(k)}_(k=1) ^(K). The original matrices are not handled any more but the following reconstructions may be used instead

$\begin{matrix} {{{\hat{A}}_{k} = {\sum\limits_{i = 1}^{r}\;{s_{k,i}u_{i}v_{i}^{T}}}},{\hat{Z} = {\sum\limits_{i = 1}^{r}\;{s_{z,i}u_{i}v_{i}^{T}}}}} & (44) \end{matrix}$ where {s_(z,i)} is the solution of min ∥Σ_(i=1) ^(r)s_(i) ^(z)u_(i)v_(i) ^(T)−Z∥_(F) ². Similar to (33), this solution is expressed as s _(r) =G _(r) ⁻¹ B _(r) ^(T) z  (45) where s_(z)=[s_(z,1), . . . , s_(z,r)]^(T) collects the r coefficients, z=vec(Z), B_(r) has the same form as (32) with i=r, and G_(r)=B_(r)B_(r) ^(T) is the Gram matrix of the basis matrices. The distance between {circumflex over (Z)} and Â_(k) is taken as the similarity measure after rank reduction, which is computed as

$\begin{matrix} {{{{\hat{Z} \cdot {\hat{A}}_{k}}}_{F}^{2} - {{{vec}\left( {\hat{Z} - {\hat{A}}_{k}} \right)}}^{2}} = {{{B_{r}\left( {s_{z} - s_{k}} \right)}}^{2} = {{\left( {s_{z} - s_{k}} \right)^{T}{G_{r}\left( {s_{z} - s_{k}} \right)}} = {{G_{r}^{\frac{1}{2}}\left( {s_{z} - s_{k}} \right)}}^{2}}}} & (46) \end{matrix}$ where

$G_{r}^{\frac{1}{2}} \in {\mathbb{R}}^{r \times r}$ is the square root matrix of G_(r). Thus,

${G_{r}^{\frac{1}{2}}s_{z}} \in {\mathbb{R}}^{r}$ can be viewed as the “feature vector” of Z extracted by the greedy algorithm. Since s_(k)=G_(r) ⁻¹B_(r) ^(T)a_(k) for the OGP, (46) is further simplified to ∥{circumflex over (Z)}−Â _(k)∥_(F) ²=(d _(z) −d _(k))^(T) G _(r) ⁻¹(d _(z) −d _(k))  (47) where d _(z) =B _(r) ^(T) z=[u ₁ ^(T) Zv ₁ , . . . ,u _(r) ^(T) Zv _(r)]^(T).  (48)

The d_(k) has similar expression as d_(z). Noting that the OGP already outputs G_(r) ⁻¹, only d_(z) and d_(k) need to calculate. The feature vector of Z extracted by the OGP can also be expressed as

$G_{r}^{- \frac{1}{2}}{d_{z}.}$ In the test stage, the nearest neighbor (NN) classifier is employed. The tested sample is assigned to the class of its closest neighbor that has the minimum distance of the reconstructed matrices {circumflex over (Z)} and Â_(k). However, it is not necessary to perform reconstruction in practice. Instead, the distance can be efficiently computed according to (46) or (47), where only r-dimensional vectors are involved.

The following lemma which facilitates the convergence analysis has been proved.

Lemma 1: At the ith iteration, when applying Algorithm 2 to the rank-one approximation of matrices {R_(k) ^(i−1)}_(k=1) ^(K), the objective function of the subproblem (20) at the solution (u_(i),v_(i)) is guaranteed

$\begin{matrix} {{\sum\limits_{k = 1}^{K}\left( {u_{i}^{T}R_{k}^{i - 1}v_{i}} \right)^{2}} \geq {\frac{1}{{\min\left( {m,n} \right)}K}{\sum\limits_{k = 1}^{K}{{R_{k}^{i - 1}}_{F}^{2}.}}}} & (49) \end{matrix}$ where (u_(i),v_(i)) denotes the solution of the rank-one approximation at the ith iteration, as well as the solution or subproblem (20) given by Algorithm 2. As Algorithm 2 adopts the manner of alternating maximization, it non-decreases the objective function and indicates that the objective function at (u_(i),v_(i)) is no less than that at the initial value (u⁰,v⁰) and therefore

$\begin{matrix} \begin{matrix} {{\sum\limits_{k = 1}^{K}\left( {u_{i}^{T}R_{k}^{i - 1}v_{i}} \right)^{2}} \geq {\sum\limits_{k = 1}^{K}\left( {\left( u^{0} \right)^{T}R_{k}^{i - 1}v^{0}} \right)^{2}}} \\ {\geq {\max\limits_{1 \leq k \leq K}\left( {\left( u^{0} \right)^{T}R_{k}^{i - 1}v^{0}} \right)^{2}}} \\ {= {\left( {\left( u^{0} \right)^{T}R_{k_{m}}^{i - 1}v^{0}} \right)^{2} = {\sigma_{m\;{ax}}^{2}\left( R_{k_{m}}^{i - 1} \right)}}} \\ {\geq {\frac{1}{K}{\sum\limits_{k = 1}^{K}{\sigma_{m\;{ax}}^{2}\left( R_{k}^{i - 1} \right)}}}} \\ {\geq {\frac{1}{K}{\sum\limits_{k = 1}^{K}{\frac{1}{{rank}\left( R_{k}^{i - 1} \right)}{\sum\limits_{l = 1}^{{rank}{(R_{k}^{i - 1})}}\;{\sigma_{l}^{2}\left( R_{k}^{i - 1} \right)}}}}}} \\ {= {\frac{1}{K}{\sum\limits_{k = 1}^{K}{\frac{1}{{rank}\left( R_{k}^{i - 1} \right)}{R_{k}^{i - 1}}_{F}^{2}}}}} \\ {\geq {\frac{1}{{\min\left( {m,n} \right)}K}{\sum\limits_{k = 1}^{K}{R_{k}^{i - 1}}_{F}^{2}}}} \end{matrix} & (50) \end{matrix}$ where the equation in the third line exploits that the initial values u⁰ and v⁰ are the principal singular vectors of R_(k) _(m) ^(i−1) and the inequality at the fourth line follows from that R_(k) _(m) ^(i−1) has the maximum spectral norm, i.e.,

${\sigma_{m\;{ax}}^{2}\left( R_{k_{m}}^{i - 1} \right)} = {\underset{k}{\max\;}{{\sigma_{m\;{ax}}^{2}\left( R_{k}^{i - 1} \right)}.}}$ Also, the equation in the sixth line is based on the fact the square of the Frobenius norm of a matrix equals the sum of the squared singular values and the last inequality is due to rank(R_(k) ^(i−1))≤min(m,n).

The following guarantees the convergence of the GP for LRAMM and gives a lower bound on the convergence rate.

The reconstruction error, i.e., the energy of the residual matrices of the GP for LRAMM in Algorithm 1 decays exponentially:

$\begin{matrix} {{\sum\limits_{k = 1}^{K}\;{R_{k}^{i}}_{F}^{2}} \leq {\gamma_{GP}^{i}{\sum\limits_{k = 1}^{K}{A_{k}}_{F}^{2}}}} & (51) \end{matrix}$ for the iteration number i=0, 1, 2, . . . , where

$\begin{matrix} {\gamma_{GP} = {1 - \frac{1}{{\min\left( {m,n} \right)}K}}} & (52) \end{matrix}$ satisfying 0<γ_(GP)<1 is a lower bound of the convergence rate. Note that in the optimization literature, exponential convergence also refers to geometric convergence or linear convergence.

Starting with the residual update formula in (13), it follows that

$\begin{matrix} \begin{matrix} {{\sum\limits_{k = 1}^{K}\;{R_{k}^{i}}_{F}^{2}} = {\sum\limits_{k = 1}^{K}{{R_{k}^{i - 1} - {s_{k,i}u_{i}v_{i}^{T}}}}_{F}^{2}}} \\ {= {\min\limits_{{{u} = {{v} = 1}},s}{\sum\limits_{k = 1}^{K}{{{s_{k}{uv}^{T}} - R_{k}^{i - 1}}}_{F}^{2}}}} \\ {= {\min\limits_{{u} = {{v} = 1}}\left\{ {{\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}} - {\underset{k = 1}{\overset{K}{\;\sum}}\left( {u^{T}R_{k}^{i - 1}v} \right)^{2}}} \right\}}} \\ {= {{\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}} - {\min\limits_{{u} = {{v} = 1}}{\underset{k = 1}{\overset{K}{\;\sum}}\left( {u^{T}R_{k}^{i - 1}v} \right)^{2}}}}} \\ {= {{\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}} - {\underset{k = 1}{\overset{K}{\;\sum}}\left( {u_{i}^{T}R_{k}^{i - 1}v_{i}} \right)^{2}}}} \\ {\leq {{\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}} - {\frac{1}{{\min\left( {m,n} \right)}K}\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}}}} \\ {= {\left( {1 - \frac{1}{{\min\left( {m,n} \right)}K}} \right)\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}}} \\ {= {\gamma_{GP}\underset{k = 1}{\overset{K}{\;\sum}}{R_{k}^{i - 1}}_{F}^{2}}} \end{matrix} & (53) \end{matrix}$ where the equations in the second and fifth lines follow from (12), i.e., (u_(i),v_(i),s^(i)) is the minimizer of the rank-one approximation or equivalently is the maximizer of (20) at the ith iteration. Meanwhile, the third equation is a reduced result of the second line with s being eliminated, which follows from (19). The inequality in (53) is due to Lemma 1. Successively applying the above relation, the following relation may be obtained:

$\begin{matrix} {{{\sum\limits_{k = 1}^{K}\;{R_{k}^{i}}_{F}^{2}} \leq {\gamma_{GP}^{i}{\sum\limits_{k = 1}^{K}{R_{k}^{0}}_{F}^{2}}}} = {\gamma_{GP}^{i}{\sum\limits_{k = 1}^{K}{A_{k}}_{F}^{2}}}} & (54) \end{matrix}$ where R_(k) ⁰=A_(k) for k=1, . . . , K may be used. Since the decay ratio satisfies 0<γ_(GP)<1, the reconstruction error strictly decreases at each iteration and the GP algorithm converges with a worst decay rate of γ_(GP).

It is apparent that the reconstruction error approaches zero:

$\begin{matrix} {{\lim\limits_{i->\infty}{\sum\limits_{k = 1}^{K}\;{R_{k}^{i}}_{F}^{2}}} = 0} & (55) \end{matrix}$

due to γ_(GP) ∈(0, 1). This implies that the stopping criterion in (14) is well defined for any δ>0. Obviously, (55) also indicates

$\begin{matrix} {{{\lim\limits_{i->\infty}{R_{k}^{i}}_{F}^{2}} = 0},{{{and}\mspace{14mu}{\lim\limits_{i->\infty}\mspace{14mu} R_{k}^{i}}} = 0},{k = 1},\ldots\mspace{14mu},{K.}} & (56) \end{matrix}$

As discussed above, the following corollary allows an infinite series expansion for an arbitrary set of matrices {A_(k)}_(k=1) ^(K).

For any matrix set {A_(k)}_(k=1) ^(K), the GP algorithm leads to an infinite series expansion, which is shown as

$\begin{matrix} {{A_{k} = {\sum\limits_{i = 1}^{\infty}\;{s_{k,i}u_{i}v_{i}^{T}}}},{k = 1},\ldots\mspace{14mu},K} & (57) \end{matrix}$ where (u_(i),v_(i),s^(i)) is the result obtained by Algorithm 1 at the ith iteration.

Successive application of the residual update formula of (13) results in

$\begin{matrix} {\begin{matrix} {R_{k}^{i} = {R_{k}^{i - 1} - {s_{k,i}u_{i}v_{i}^{T}}}} \\ {= {R_{k}^{0} - {\sum\limits_{l = 1}^{i}{s_{k,l}u_{l}v_{l}^{T}}}}} \\ {= {A_{k} - {\sum\limits_{l = 1}^{i}{s_{k,l}u_{l}v_{l}^{T}}}}} \end{matrix}{or}} & (58) \\ {A_{k} - {\sum\limits_{l = 1}^{i}{s_{k,l}u_{l}v_{l}^{T}}} + {R_{k}^{i}.}} & (59) \end{matrix}$

Exploiting

${\lim\limits_{i\rightarrow\infty}R_{k}^{i}} = 0$ in (56) and taking limits as i→∞ on both sides of (59) yields (57).

Preferably, the EGP has the same convergence result. To prove the convergence of the EGP, it is necessary to replace “≥” by “=” in the first line of (50), while other steps remain unchanged as the GP. It should be pointed out that the GP and EGP merely have the worst-case lower bounds of the convergence rates. Their practical convergence rates are generally different. The worst case refers to that there is no improvement using the iterative procedure in Algorithm 2 compared with merely using the initial value of (25). This case happens when v^(u) is in the intersection of the null spaces of {R_(k) ^(i−1)}_(k≠k) _(m) , resulting in R_(k) ⁻¹v⁰=0 for k≠k_(m).

However, if {R_(k) ^(i−1)}_(k) have similar low-rank structure, the principal singular vector of one matrix lies in all null spaces of other matrices does not happen. That is, the iterative procedure in Algorithm 2 will improve the rank-one solution, which makes the GP generally converge faster than the EGP.

It is clear that the convergence of the OGP is guaranteed since its reconstruction error decreases faster than that of the GP due to the re-minimization with respect to the weight coefficients of (28) at each iteration. This means that the convergence rate of the OGP is faster than the GP. Theorem 2 further states how much the OGP is faster than the GP, where the acceleration factor is quantitatively given.

The reconstruction error of the OGP for LRAMM in Algorithm 4 decays exponentially

$\begin{matrix} {{\sum\limits_{k = 1}^{K}{R_{k}^{i}}_{F}^{2}} \leq {\gamma_{OGP}^{i - 1}\gamma_{GP}{\sum\limits_{k = 1}^{K}{A_{k}}_{F}^{2}}}} & (60) \end{matrix}$ for the iteration number i=0, 1, 2, . . . , where a lower bound of the convergence rate

$\begin{matrix} {\gamma_{OGP} = {1 - \frac{\rho}{{\min\left( {m,n} \right)}K}}} & (61) \end{matrix}$ satisfies 0<γ_(OGP)<1 with ρ>1 being the acceleration factor. Also, it follows that γ_(OGP)<γ_(GP) due to ρ>1 and hence the OGP converges faster than the GP.

The following lemma is also proved.

Lemma 2: The squared Frobenius norms of R_(k) ^(i) and R_(k) ^(i−1) for i=1, 2, . . . , are linked by ∥R _(k) ^(i)∥_(F) ² =∥R _(k) ^(i−1)∥_(F) ²−ρ_(i)(u _(i) ^(T) R _(k) ^(i−1) v _(i))²  (62) where ρ₁=1 and for i≥2

$\begin{matrix} {\rho_{i} = {\frac{1}{\sin^{2}\theta_{i}} > 1}} & (63) \end{matrix}$ with θ_(i) being the angle between vec(u_(i)v_(i) ^(T)) and the subspace spanned by {vec(u_(l)v_(l) ^(T)}_(l=1) ^(i−1).

At the first iteration or i=1, the OGP obtains the same result as the GP. By (18), ρ₁=1 is determined. The case of i≥2 is then considered. Denoting r_(k) ^(i)−vec(R_(k) ^(i)) and r_(k) ^(i−1)=vec(R_(k) ^(i−1)), ∥R_(k) ^(i)∥_(F) ²=r_(k) ^(i)∥² and ∥R_(k) ^(i−1)∥_(F) ²=r_(k) ^(i−1)∥² are obtained. The relation between ∥r_(k) ^(i)∥² and ∥r_(k) ^(i−1)∥² may be further derived. Recalling r_(k) ^(i)=B_(i)s^(i)−a_(k), (33) it follows that r _(k) ^(i)=(Π_(i) −I)a _(k)=−Π_(i) ^(⊥) a _(k)  (64) where Π_(i) B _(i)(B _(i) ^(T) B _(i))⁻¹ B _(i) ^(T)∈

^(mn×mn)  (65) is the orthogonal projection matrix onto range(B_(i)), i.e., the column space of B_(i), while Π_(i) ^(⊥) =i−Π _(i)  (66) is the projection matrix onto the complementary subspace of range(D_(i)), i.e., range(B_(i) ^(⊥)).

Then, ∥r _(k) ^(i)∥²=∥Π_(i) ^(⊥) a _(k)∥² =a _(k) ^(T)(Π_(i) ^(⊥))^(T)Π_(i) ^(⊥) a _(k) a _(k) ^(T)Π_(i) ^(⊥) a _(k)  (67) where the projection matrix is symmetric and idempotent, i.e., Π_(i) ^(⊥)=(Π_(i) ^(⊥))^(T)=(Π_(i) ^(⊥))². Similarly, ∥r_(k) ^(i−1)∥²=a_(k) ^(T)Π_(i−1) ^(⊥)a_(k) where Π_(i−1) =B _(i−1)(B _(i−1) ^(T) B _(i−1))⁻¹ B _(i−1) ^(T).  (68)

Plugging B_(i)=[B_(i−1), b_(i)] into (65) and using the block matrix inversion formula with tedious but straightforward derivations,

$\begin{matrix} {\prod_{i}{- {\prod_{i - 1}{{+ \rho_{i}}{\prod_{i - 1}^{\bot}{b_{i}b_{i}^{T}{\prod_{i - 1}^{\bot}\mspace{14mu}{and}}}}}}}} & (69) \\ {\prod_{i}^{\bot}{= {\prod_{i - 1}^{\bot}{{- \rho_{i}}{\prod_{i - 1}^{\bot}{b_{i}b_{i}^{T}{\prod_{i - 1}^{\bot}\mspace{14mu}{where}}}}}}}} & (70) \\ {\rho_{i} = {\frac{1}{b_{i}^{T}{\prod_{i - 1}^{\bot}b_{i}}}.}} & (71) \end{matrix}$ To prove ρ_(i)>1, denoting the projection onto a convex set C as Π_(c)(·). The non-expansiveness states that ∥Π_(c)(b)∥≤b∥ is true for any vector b. Since range(B_(i−1)) is a subspace and also a convex set, the projections Π_(i−1) and Π_(i−1) ^(⊥) are non-expansive. This elicits b _(i) ^(T)Π_(i−1) ^(⊥) b _(i)=∥Π_(i−1) ^(⊥) b _(i)∥² ≤∥b _(i)∥²=1  (72) where ∥b_(i)∥²=1 is due to (38). Only when b_(i) is orthogonal to range(B_(i)) or

b_(i), b_(l)

=0 for all l=1, . . . , i−1, ∥Π_(i−1)b_(i)∥²=∥b_(i)∥² happens and ρ_(i)=1 in this case. Since

b _(i) ,b _(l)

=

u _(i) v _(i) ^(T) ,u _(l) v _(l) ^(T) =

u _(i) ,u _(l)

v _(i) ,v _(l)

_(,)  (73)

b_(i),b_(l)

=0 implies

u_(i),v_(l)

=0 or

v_(i),v_(l)

=0 for all l=1, . . . , i−1. However, unlike the orthogonal requirement in the GLRAM, orthogonalization is not performed to the columns of U or V. For random matrices or matrices containing random components, the probability of

v_(i),v_(l)

=0 or

v_(i),v_(l)

=0 is zero. Hence, ∥∅_(i−1) ^(⊥)b_(i)∥²<∥b_(i)∥² and ρ_(i)>1 are determined in general. The cosine of the angle between b_(i) and the subspace range(B_(i−1) ^(⊥)) is

$\begin{matrix} {{\cos\;\phi_{i}} = {\frac{\left\langle {b_{i},{\prod_{i - 1}^{\bot}b_{i}}} \right\rangle}{{b_{i}}{{\prod_{i - 1}^{\bot}b_{i}}}} = {{{\prod_{i - 1}^{\bot}b_{i}}} = \frac{1}{\sqrt{\rho_{i}}}}}} & (74) \end{matrix}$ where

b_(i),Π_(i−1) ^(⊥)h_(i)

=b_(i) ^(T)Π_(i−1) ^(⊥)b_(i)=∥Π_(i−1) ^(⊥)b_(i)∥² and ∥b_(i)∥=1 are used. Hence,

$\begin{matrix} {\rho_{i} = {\frac{1}{\cos^{2}\phi_{i}} = \frac{1}{\sin^{2}\theta_{i}}}} & (75) \end{matrix}$ where θ_(i) is the angle between b_(i) and the subspace range(B_(i−1)) and θ_(i)+ϕ=π/2 because range(B_(i−1) ^(⊥))) is the orthogonal compliment of range(B_(i−1))

It follows from (70) that a _(k) ^(T)Π_(i) ^(⊥) a _(k) =a _(k) ^(T)Π_(i−1) ^(⊥) a _(k)−ρ_(i) a _(k) ^(T)Π_(i−1) ^(⊥) b _(i) b _(i) ^(T)Π_(i−1) ^(⊥) a _(k).   (76)

Substituting (64) and (67) into (76) yields ∥r _(k) ^(i)∥² =∥r _(k) ^(i 1)∥²−ρ_(i)(b _(i) ^(T) r _(k) ^(i−1))².  (77)

Since

$\begin{matrix} \begin{matrix} {{b_{i}^{T}r_{k}^{i - 1}} = {\left\langle {b_{i},r_{k}^{i - 1}} \right\rangle = \left\langle {{u_{i}v_{i}^{T}},R_{k}^{i - 1}} \right\rangle}} \\ {= {{{tr}\left( {v_{i}u_{i}^{T}R_{k}^{i - 1}} \right)} = {u_{i}^{T}R_{k}^{i - 1}v_{i}}}} \end{matrix} & (78) \end{matrix}$ (77) in the vector form is equivalent to the matrix form of (62), which completes the proof.

Summing (62) in Lemma 2 from k=1 to K and using Lemma 1 leads to

$\begin{matrix} \begin{matrix} {{\sum\limits_{k = 1}^{K}{R_{k}^{i}}_{F}^{2}} = {{{\sum\limits_{k = 1}^{K}{R_{k}^{i - 1}}_{F}^{2}} - {\rho_{i}{\sum\limits_{k = 1}^{K}\left( {u_{i}^{T}R_{k}^{i - 1}v_{i}} \right)^{2}}}} \leq}} \\ {{\sum\limits_{k = 1}^{K}{R_{k}^{i - 1}}_{F}^{2}} - {\frac{\rho_{i}}{{\min\left( {m,n} \right)}K}{\sum\limits_{k = 1}^{K}{R_{k}^{i - 1}}_{F}^{2}}}} \\ {= {\left( {1 - \frac{\rho_{i}}{{\min\left( {m,n} \right)}K}} \right){\sum\limits_{k = 1}^{K}{R_{k}^{i - 1}}_{F}^{2}}}} \end{matrix} & (79) \end{matrix}$ which holds true for i, i−1, . . . , 2. Successively applying (79) results in

$\begin{matrix} {{{\sum\limits_{k = 1}^{K}{R_{k}^{i}}_{F}^{2}} \leq {\prod\limits_{l = 2}^{i}\;{\left( {1 - \frac{\rho_{l}}{{\min\left( {m,n} \right)}K}} \right){\sum\limits_{k = 1}^{K}{R_{k}^{1}}_{F}^{2}}}} \leq {\left( {1 - \frac{\rho}{{\min\left( {m,n} \right)}K}} \right)^{i - 1}{\sum\limits_{k = 1}^{K}{R_{k}^{1}}_{F}^{2}}}} = {\gamma_{OGP}^{i - 1}{\sum\limits_{k = 1}^{K}{R_{k}^{1}}_{F}^{2}}}} & (80) \end{matrix}$ where ρ−min{ρ₂, . . . , ρ_(i)} and ρ>1. For i=1, the OGP is the same as the GP and thus the reconstruction error obeys

$\begin{matrix} {{\sum\limits_{k = 1}^{K}{R_{k}^{1}}_{F}^{2}} \leq {\gamma_{GP}{\sum\limits_{k = 1}^{K}{{A_{k}}_{F}^{2}.}}}} & (81) \end{matrix}$

Combining (80) and (81) yields (60).

From the second iteration (i≥2), the OGP converges faster than the GP because of γ_(OGP)<γ_(GP). The acceleration ratio depends on ρ. The larger the value of ρ, the faster the OGP is. Furthermore, the OGP has finite convergence property, as stated in the following theorem.

The current residual matrices {R_(k) ^(i)}_(k=1) ^(K) generated by the OGP are orthogonal to all rank-one matrices {u_(l)v_(l) ^(T)}_(l=1) ^(i) that have been chosen. These selected rank-one matrices are linearly independent with each other. The reconstruction error of the OGP will be zero after at most mn iterations.

Two matrices A and B are orthogonal if their inner product

A, B

=0, or equivalently,

vec(A),vec(B)

=0. According to (64), it follows that B _(i) ^(T) r _(k) ^(i) =−B _(i) ^(T)Π_(i) ^(⊥) a _(k)=0, k=1, . . . ,K  (82) where B_(i) ^(T)Π_(i) ^(⊥)=0 is used, which is obviously true because Π_(i) ^(⊥) is the projection onto the orthogonal complement of range(B_(i)). Then, (82) amounts to

b_(l), r_(k) ^(i)

=0 or

u_(l)v_(l) ^(T), R_(k) ^(i)

=0, for l=1, . . . , i. This completes the proof of {R_(k) ^(i)}_(k=1) ^(K) being orthogonal to {u_(l)v_(l) ^(T)}_(l=1) ^(i). To prove that {u_(l)v_(l) ^(T)} are linearly independent with each other, it is only necessary to show the rank-one matrix obtained in the next iteration is linearly independent of all rank-one matrices in the previous iterations. u_(i+1)v_(i+1) is linearly independent of {u_(l)v_(l) ^(T)}_(l=1) ^(i) by contradiction. Suppose that u_(i+1)v_(i+1) ^(T) is linearly dependent of {u_(l)v_(l) ^(T)}_(l=1) ^(i). Then, it can be represented by the linear combination of {u_(l)v_(l) ^(T)}_(l=1) ^(i), i.e., u_(i+1)v_(i+1) ^(T)=Σ_(l=1) ^(i) α_(l)u_(l)v_(l) ^(T) with α_(l) ∈

. The inner product

$\begin{matrix} {\left\langle {{u_{i + 1}v_{i + 1}^{T}},R_{k}^{i}} \right\rangle = {{\sum\limits_{l = 1}^{i}{\alpha_{l}\left\langle {{u_{l}v_{l}^{T}},R_{k}^{i}} \right\rangle}} = 0}} & (83) \end{matrix}$ implies u_(i+1) ^(T)R_(k) ^(i)v_(i+1)=0 for k=1, . . . , K as well as Σ_(k=1) ^(k) (u_(i+1) ^(T)R_(k) ^(i)v_(i+1))²=0. According to (20), the GP and OGP select (u_(i+1),v_(i+1)) such that

$\begin{matrix} {\left( {u_{i + 1},v_{i + 1}} \right) = {\arg\underset{{u} = {{v} - 1}}{\;\max}{\sum\limits_{k = 1}^{K}{\left( {u^{T}R_{k}^{i}v} \right)^{2}.}}}} & (84) \end{matrix}$

But Σ_(k=1) ^(K) (u_(i+1) ^(T)R_(k) ^(i)v_(i+1))²=0 attains the minimum due to the assumption that u_(i+1)v_(i+1) ^(T) is linearly dependent of {u_(l)v_(l) ^(T)}_(l=1) ^(i). This case only happens when all residual matrices {R_(k) ^(i)}_(k=1) ^(K) vanish and the reconstruction error becomes zero. In this case, the algorithm has already converged and terminated. Otherwise, it contradicts.

{u_(l)v_(l) ^(T)}, or {b_(l)=vec(u_(l)v_(l) ^(T))}_(l) are linearly independent with each other provided that any residual matrix does not vanish. After i=mn iterations, B_(i) contains mn linearly independent columns which span the whole space of

^(mn). Then, the residual r_(k) ^(i)=−Π_(i) ^(⊥)a_(k)=0 due to Π_(i) ^(⊥)=0, which also implies R_(k) ^(i)=0 and Σ_(k=1) ^(K)∥R_(k) ^(i)∥_(F) ²=0 after at most mn iterations.

In many practical applications, mn is usually very large. Generally, a target rank r<<m, is enough to capture the low-rank structure and achieves a small reconstruction error.

The inventors performed experiments to evaluate the performances and advantages of the method for processing electronic data in accordance with embodiments of the present invention. In the experiments, the simulation and experiments were conducted using a computer with a 2.2 GHz CPU and 4 GB memory. In addition to synthetic random data, the following three example databases, including two face datasets, and one object dataset, are used in the experiments.

In a first experiment performed, the ORL face database was processed and analysed. The database consists of 10 different images for each of 40 distinct subjects, and thus there are a total of 400 images. The resolution of the gray-scale images is 112×92, indicating that m=92, n=112, and K=400.

In a second experiment performed, the Georgia Tech face database was processed and analysed. The database contains 750 images of 50 individuals. There are 15 images for each individual. As the original images are colored and with different sizes, the images were converted to gray-scale with the same dimensions of 111×156, corresponding to m=150, n=111, and K=750.

In a third experiment performed, the MNIST database of handwritten digits was processed and analysed. The database is composed of images of digits 0 to 9 written by 500 different people. There are 70000 images with dimensions of 28×28 in total while a smaller subset of 2000 samples were selected, which results in m=n=28 and K=2000.

In a fourth experiment, the COIL-20 database was processed and analysed. The database includes 1440 gray-scale images of 20 different objects, which corresponds to 72 images per object. The image resolution is 128×128 and it follows m=n=128 and K=1440.

The convergence behaviors are investigated using synthesized random data. The parameters are set as m=100, n=120, and K=15. A set of noise-free matrices of rank 10 is generated by A_(k)=U S_(k)V^(T), k=1, . . . , K, where the entries of U∈

^(m×10) and V∈

^(10×n) satisfy the standard Gaussian distribution while the diagonal entries of S_(k) are uniformly distributed in [1,2] to avoid any diagonal entry being too close to zero.

With reference to FIG. 2, there is shown a plot showing the NRE of (14) versus iteration number for noise-free case. It may be observed that the reconstruction error rapidly decreases when the iteration number is not larger than the rank. The reconstruction error approaches zero as the iteration proceeds. The OGP converges fastest while the EGP converges slowest, although all of them have linear convergence rates. Then, the noise matrices N_(k) whose entries are independent and identically distributed Gaussian with variance σ_(n) ² are added to A_(k). The signal-to-noise ratio (SNR) is defined as SNR=(Σ_(k=1) ^(K)∥A_(k)∥_(F) ²)/(mnKσ_(n) ². In the presence of noise, the oracle bound of the NRE is dominated by the noise level.

With reference to FIG. 3, it is shown that the NRE as well as the oracle bound versus iteration number at SNRs of 10, 20, and 30 dB. The oracle bounds equal the reciprocal of the SNRs. It may be observed that the reconstruction error rapidly decreases when i≤10. After approaching the oracle bounds, the NREs decrease slowly. This means that the greedy methods have captured the dominating low-rank structure and the iterative procedure can be stopped since little improvement is attained due to the impact of the noise.

With reference to FIG. 4, it is shown that the NRE versus iteration number on the four example image databases. It is observed that the three greedy algorithms significantly decrease the reconstruction error at the beginning stage. This implies that these real-world images exhibit several “principal components” and the proposed methods successfully find these components although the data are not strictly of low-rank. The maximum iteration numbers r are set to 149, 231, 53, and 378 for the ORL face, Georgia Tech face, MNIST, and COIL-20 databases, respectively. To achieve a sufficiently small reconstruction error, the maximum iteration number r needs to be larger than min(m,n) but r<<mn. Again, the OGP has the fastest convergence rate while the EGP is the slowest.

In accordance with an embodiment of the present invention, the system for processing electronic data may be used for processing digital image data. In this embodiment, the method for processing digital image data comprises the steps of: transforming the digital image data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices; and constructing at least one digital image based on the low-rank approximation of the plurality of matrices.

The digital image data may represent at least two source images with noise and/or defect.

In another experiment performed by the inventors, the performances of an image reconstruction process of the GP/EGP/OGP method in accordance with the present invention are compared with the PCA, 2DPCA, and GLARM methods in the presence of outliers.

For fair comparison, the NREs of the six methods are computed under the same (or close) compression ratios. According to Table 1, r₁, r₂, r₃, and r₄, which denote the target ranks of PCA, 2DPCA, GLARM, and GP/EGP/OGP, respectively, should satisfy

$\begin{matrix} {{\left( {{mn} + K + 1} \right)r_{1}} = {{\left( {{mK} + n} \right)r_{2}} = {{\left( {m + n + {Kr}_{3}} \right)r_{3}} = {\left( {m + n + K} \right)r_{4}}}}} & (85) \end{matrix}$ to achieve comparable compression ratios in the six methods. Noting that r₁, . . . , r₄ are positive integers, (85) may not be strictly satisfied. The positive integers may be selected such that the compression ratios are as close as possible.

With reference to FIGS. 5 to 8, there is shown representative examples of the reconstructed images obtained by the six algorithms on the ORL face, Georgia Tech face, MNIST, and COIL-20 databases, respectively. The corresponding samples of the original images are also included for comparison. Obviously, the three greedy algorithms have much smaller reconstruction error than the PCA and 2DPCA under similar compression ratios. In other words, the greedy methods can achieve higher compression ratio under the same reconstruction error. This is because the reconstruction error monotonically increases with the compression ratio. In general, the GP, OGP and GLRAM have comparable performance. Note that for the COIL-20 database, the NRE of the OGP, 9.70×10⁻³, is moderately smaller than that of the GLRAM, 1.22×1.0⁻², while their compression ratios are nearly the same.

With reference to FIG. 5, the compression ratios of the PCA, 2DPCA, GLARM, and EGP/GP/OGP are 48.1, 45.9, 44.3, and 45.8 corresponding to the ranks of 8, 2, 15, and 149. The NREs of the above six methods are 4.56×10⁻², 5.02×10⁻², 1.63×10⁻², 2.75×10⁻², 1.82×10⁻², and 1.66×10⁻², respectively.

With reference to FIG. 6, the compression ratios of the PCA, 2DPCA, GLARM, and EGP/GP/OGP are 55.3, 55.4, 58.7, and 55.5 corresponding to the ranks of 13, 2, 17, and 230. The NREs of the above six methods are 5.51×10⁻², 6.69×10⁻², 1.82×10⁻², 3.15×10⁻², 1.94×10⁻², and 1.72×10⁻², respectively. A compression ratio of 55.5 implies that only 1.8% storage space of the original data is needed after compression.

With reference to FIG. 7, the compression ratios of the PCA, 2DPCA, GLARM, and EGP/GP/OGP are 14.1, 14.0, 15.9, and 13.9 corresponding to the ranks of 40, 2, 7 and 55. The NREs of the above six methods are 0.129, 0.363, 0.15, 0.218, 0.128, and 0.117, respectively.

With reference to FIG. 8, the compression ratios of the PCA, 2DPCA, GLARM, and EGP/GP/OCP are 31.3, 32.0, 33.3, and 32.0 corresponding to the ranks of 22, 4, 22, and 378. The NREs of the above six methods are 3.45×10⁻², 4.63×10⁻², 1.22×10⁻², 2.29×10⁻², 1.17×10⁻², and 9.70×10⁻³, respectively.

The inventors also investigated how the reconstruction error varies with the compression ratio. With reference to FIG. 9, there is shown a plot of the NREs of the six methods versus compression ratio for the four image databases. The reconstruction errors of all GP/EGP/OGP methods monotonically increase with the compression ratio. The OGP has the best reconstruction performance for all databases. The GP and GLRAM have similar performance as the OGP. Despite the low computational cost of the 2DPCA, its performance is not satisfactory because it only uses a single-sided transform, resulting in limited compression capability. Note that for the MNIST database, the performance of the PCA is good and comparable to the GLRAM, GP, and OGP, where mn=788 is less than K=2000. For other three databases where m>>K, the performance of the PCA has a large gap compared with the GLRAM and greedy methods. Therefore, for high-dimensional data with very large mn, the advantage of 2D based methods over the vectorization based methods is more evident.

With reference to FIG. 10, there is shown a plot of the recognition rate versus compression ratio on the ORL face database. In this experiment, 60% samples of each class are randomly selected as training samples while the remaining 40% are used as test samples. The six methods learn the low-rank structure using the training set and then the NN classifier is employed to the test samples. It may be observed that the performance of the PCA is the worst for classification. The performances of the other five methods are similar for small compression ratios. For large compression ratios, the 2DPCA becomes slightly worse. It is also seen that the recognition performance is not so sensitive to the compression ratio or reduced dimension.

These embodiments may be advantageous in that a matrix approximation method including three variants, namely, GP, EGP and OGP is provided, for learning the common low-rank structure of multiple matrices, which is an extension of the single matrix case. Unlike the existing non-diagonal decompositions, the method realizes a non-orthogonal but joint diagonal decomposition of multiple matrices, which allows a more parsimonious representation and a higher compression ratio.

Advantageously, the GP works in an iterative manner. At each iteration, it finds a rank-one approximation of the residuals. For GP and OGP, an alternating optimization method is devised for the rank-one fitting problem while for the EGP, just an approximate solution is employed to reduce the complexity. To accelerate the convergence, the OGP re-computes the weights of the basis matrices, where least squares orthogonalization is recursively solved.

It has been proven that the reconstruction error of each of the GP, EGP and OGP decays exponentially. The lower bound of the convergence rate or decay factor of the GP and EGP is derived. In addition, the finite convergence of the OGP is proved. In addition, it is shown that OGP converges faster than the GP. The acceleration factor of the OGP over GP/EGP is dominated by the angle between the current iterate and the subspace spanned by the previous iterates.

All of the GP/EGP/OGP methods are scalable to the problem size and computationally more efficient than the celebrated PCA since the method directly deals with the 2D matrices. When compared with other 2D based approaches such as 2DPCA and GLRAM, the GP, EGP and OGP achieve a joint diagonal decomposition for multiple matrices, and hence have a higher compression ratio given the same target rank. In other words, the greedy algorithms achieve smaller reconstruction errors under the same compression ratio.

Furthermore, the experimental results demonstrate the superiority of the GP methods over the PCA, 2DPCA and GLRAM, in terms of computational simplicity, convergence speed and reconstruction accuracy.

It will also be appreciated that where the methods and systems of the present invention are either wholly implemented by computing system or partly implemented by computing systems then any appropriate computing system architecture may be utilised. This will include stand alone computers, network computers and dedicated hardware devices. When the terms “computing system” and “computing device” are used, these terms are intended to cover any appropriate arrangement of computer hardware capable of implementing the function described.

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated. 

The invention claimed is:
 1. A method for processing digital image data, comprising the steps of: transforming the digital image data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices; and constructing at least one digital image based on the low-rank approximation of the plurality of matrices, wherein the approximation process includes: a greedy pursuit approximation process, and a linear regression process, and further includes: determining a best rank-one approximation of a plurality of residual matrices in each of a plurality of iteration steps in the approximation process; and subtracting a plurality of rank-one basis matrices from the plurality of residual matrices.
 2. The method for processing digital image data in accordance with claim 1, wherein the digital image data are digitally compressed data.
 3. The method for processing digital image data in accordance with claim 1, wherein the digital image data represent at least one source image with noise and/or defect.
 4. The method for processing digital image data in accordance with claim 1, further comprising the step of processing the digital image data using a feature extraction process for classification of the digital image data.
 5. The method for processing digital image data in accordance with claim 1, further comprising the step of representing a solution associated with the matrix representation and the K residual matrices as (u_(i),v_(i),s^(i)) and {R_(k) ^(i)}_(k=1) ^(K), respectively at the ith iteration step.
 6. The method for processing digital image data in accordance with claim 5, wherein in the ith iteration step, the rank-one approximation of {R_(k) ^(i−1)}_(k=1) ^(K) is determined.
 7. The method for processing digital image data in accordance with claim 6, further comprising the step of determining coefficients {s_(k,l)} of the matrix representation (u_(i),v_(i),s^(i)) at the ith iteration.
 8. The method for processing digital image data in accordance with claim 7, wherein the coefficients {s_(k,l)} are determined by a least squares method.
 9. The method for processing digital image data in accordance with claim 8, wherein the coefficients {s_(k,l)} are determined after obtaining {u_(l)}_(l=1) ^(i) and {v_(l)}_(l=1) ^(i) at the ith iteration.
 10. The method for processing digital image data in accordance with claim 7, wherein the approximation process includes an orthogonal greedy pursuit approximation process.
 11. A method for processing digital image data, comprising the steps of: transforming the digital image data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices; and constructing at least one digital image based on the low-rank approximation of the plurality of matrices; wherein the approximation process includes a greedy pursuit approximation process; wherein the approximation process includes a linear regression process; wherein the approximation process further includes a rank-one fitting process for obtaining the low-rank approximation of the plurality of matrices; and wherein the low-rank approximation of the plurality of matrices are determined to be a principal singular value of each of the plurality of matrices having a maximum spectral norm.
 12. The method for processing digital image data in accordance with claim 11, wherein the approximation process includes an economic greedy pursuit approximation process.
 13. The method for processing digital image data in accordance with claim 11, wherein the digital image data are digitally compressed data.
 14. The method for processing digital image data in accordance with claim 11, wherein the digital image data represent at least one source image with noise and/or defect.
 15. The method for processing digital image data in accordance with claim 11, further comprising the step of processing the digital image data using a feature extraction process for classification of the digital image data.
 16. A method for processing digital image data, comprising the steps of: transforming the digital image data to a matrix representation including a plurality of matrices; decomposing the matrix representation into a series of matrix approximations; processing, with an approximation process, the plurality of matrices thereby obtaining a low-rank approximation of the plurality of matrices; and constructing at least one digital image based on the low-rank approximation of the plurality of matrices; wherein the step of decomposing the matrix representation into a series of matrix approximations includes a non-orthogonal decomposition of the plurality of matrices.
 17. The method for processing digital image data in accordance with claim 16, wherein the non-orthogonal decomposition includes a joint diagonal decomposition.
 18. The method for processing digital image data in accordance with claim 16, wherein the digital image data are digitally compressed data.
 19. The method for processing digital image data in accordance with claim 16, wherein the digital image data represent at least one source image with noise and/or defect.
 20. The method for processing digital image data in accordance with claim 16, further comprising the step of processing the digital image data using a feature extraction process for classification of the digital image data. 